obs: [x, y, hint, button]
button feature means nothing to a1
button reveals the hint to a1 if a0 already has it
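
rough sketch of the obs layout and the button rule, in Python. Everything not in the notes above is an assumption: the names (AgentState, observe, button_reveals_hint), hint encoded as an int with 0 = not known, and the button feature read as "agent is standing on the button". How a1 then gets to see the revealed hint is not modeled here.

```python
from dataclasses import dataclass
from typing import List

HIDDEN = 0  # assumed encoding: 0 means the agent does not have the hint


@dataclass
class AgentState:
    x: int
    y: int
    hint: int = HIDDEN       # assumed: nonzero value identifies the right exit
    on_button: bool = False  # assumed meaning of the "button" feature


def observe(agent: AgentState) -> List[int]:
    """Build the per-agent observation in the order [x, y, hint, button]."""
    return [agent.x, agent.y, agent.hint, int(agent.on_button)]


def button_reveals_hint(a0: AgentState) -> bool:
    """The button only reveals the hint if a0 is on it and already has the hint."""
    return a0.on_button and a0.hint != HIDDEN
```

e.g. an a0 that stands on the button without the hint gets `button_reveals_hint(a0) == False`, so nothing is shown to a1 yet.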

env1: a0 starts on the button and already knows the hint. a1 then needs to see the hint and go to the right exit
env2: a0 already knows the hint but needs to find the button, then presses it to show the hint. a1 then needs to see the hint and go to the right exit
env3: a0 needs to find the button and doesn't know the hint. It needs to press the button to show the hint for a1. a1 then needs to see the hint and go to the right exit
env4: same as env3, except the buttons and hints for a0 are on the same side instead of opposite ends
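
the four envs could be written down as a small config table, sketched below in Python. EnvConfig, its field names, and a0_prerequisites are hypothetical. Two things are assumed rather than stated in the notes: that env4 matches env3 in everything except the layout, and that a0 has to pick up the hint before the button is useful. The layout field is None where the notes don't say.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EnvConfig:
    a0_starts_on_button: bool
    a0_knows_hint: bool
    # True = button and hint on the same side, False = opposite ends,
    # None = not specified in the notes.
    hint_button_same_side: Optional[bool] = None


ENVS: Dict[str, EnvConfig] = {
    "env1": EnvConfig(a0_starts_on_button=True, a0_knows_hint=True),
    "env2": EnvConfig(a0_starts_on_button=False, a0_knows_hint=True),
    "env3": EnvConfig(False, False, hint_button_same_side=False),
    # Assumption: env4 differs from env3 only in the layout.
    "env4": EnvConfig(False, False, hint_button_same_side=True),
}


def a0_prerequisites(cfg: EnvConfig) -> List[str]:
    """What a0 still has to do before the button can reveal the hint."""
    steps = []
    if not cfg.a0_knows_hint:
        steps.append("find_hint")    # assumed to come before the button
    if not cfg.a0_starts_on_button:
        steps.append("find_button")
    return steps
```

in every env a1's part stays the same: see the hint, then go to the right exit.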